A Faster Pseudopolynomial Time Algorithm for Subset Sum


Konstantinos Koiliaris*    Chao Xu†


Abstract 

Given a multiset $S$ of $n$ positive integers and a target integer $t$, the subset sum
problem is to decide if there is a subset of $S$ that sums up to $t$. We present a new
divide-and-conquer algorithm that computes all the realizable subset sums up to an integer
$u$ in $\tilde{O}(\min\{\sqrt{n}u, u^{4/3}, \sigma\})$ time, where $\sigma$ is the sum of all elements in $S$ and $\tilde{O}$ hides
polylogarithmic factors. This result improves upon the standard dynamic programming
algorithm that runs in $O(nu)$ time. To the best of our knowledge, the new algorithm is
the fastest general deterministic algorithm for this problem. We also present a modified
algorithm for finite cyclic groups, which computes all the realizable subset sums within
the group in $\tilde{O}(\min\{\sqrt{n}m, m^{5/4}\})$ time, where $m$ is the order of the group.


1. Introduction 

Given a multiset $S$ of $n$ positive integers and an integer target value $t$, the subset sum problem
is to decide if there is a subset of $S$ that sums to $t$. The subset sum problem is related to the
knapsack problem [11] and it is one of Karp's original NP-complete problems [25]. Subset
sum is a fundamental problem used as a standard example of a problem that can be solved in
weakly polynomial time in many undergraduate algorithms/complexity classes. As a weakly
NP-complete problem, it has a standard pseudopolynomial time algorithm using dynamic
programming, due to Bellman, that solves it in $O(nt)$ time [2] (see also [9, Chapter 34.5]). The
current state of the art has since been improved by a $\log t$ factor using a bit-packing technique
[32]. There is extensive work on the subset sum problem; see Table 1.1 for a summary of
previous deterministic pseudopolynomial time results [2, 33, 15, 31, 27, 32, 29, 37, 38].
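For orientation, the textbook dynamic program can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation of [2] or [32]: it keeps the reachable sums as the bits of a single Python integer, which loosely mirrors the bit-packing idea. The function name `subset_sums_dp` is hypothetical.

```python
def subset_sums_dp(S, t):
    """Return the set of all subset sums of the multiset S that are at most t."""
    mask = (1 << (t + 1)) - 1      # keep only bits 0..t
    reachable = 1                  # bit 0 set: the empty subset sums to 0
    for s in S:
        # Either skip s (old bits) or add s to every reachable sum (shifted bits).
        reachable |= (reachable << s) & mask
    return {i for i in range(t + 1) if (reachable >> i) & 1}
```

Answering the decision problem for a target $t$ then amounts to checking whether $t$ is in the returned set.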

Moreover, there are results on subset sum that depend on properties of the input, as well
as data structures that maintain subset sums under standard operations. In particular, when
the maximum value of any integer in $S$ is relatively small compared to the number of elements
$n$, and the target value $t$ lies close to one-half the total sum of the elements, then one can
solve the subset sum problem in almost linear time [16]. This was improved by Chaimovich
[7]. Furthermore, Eppstein described a data structure which efficiently maintains all subset
sums up to a given value $u$, under insertion and deletion of elements, in $O(u \log u \log n)$ time
per update, which can be accelerated to $O(u \log n)$ when additional information about future
updates is known [14]. The probabilistic convolution tree, by Serang [37, 38], is also able to
solve the subset sum problem in $\tilde{O}(n \max(S))$ time, where $\tilde{O}$ hides polylogarithmic factors.

If randomization is allowed, more algorithms are possible. In particular, Bringmann
showed a randomized algorithm that solves the problem in $\tilde{O}(nt)$ time, using only $\tilde{O}(n \log t)$
space under the Extended Riemann Hypothesis [4]. Bringmann also provided a randomized
near linear time algorithm running in $\tilde{O}(n + t)$ time; it remains open whether this algorithm can be
derandomized.

Result                       | Time                                            | Space             | Comments
-----------------------------+-------------------------------------------------+-------------------+------------------------------------------------
Bellman [2]                  | $O(nt)$                                         | $O(t)$            | original DP solution
Pisinger [32]                | $O(nt / \log t)$                                | $O(t / \log t)$   | RAM model implementation of Bellman
Pisinger [33]                | $O(n \max S)$                                   | $O(t)$            | fast if small $\max S$
Faaland [15], Pferschy [31]  | $O(n' t)$                                       | $O(t)$            | fast for small $n'$
Klinz et al. [27]            | $O(\sigma^{3/2})$                               | $O(t)$            | fast for small $\sigma$, obtainable from above because $n' = O(\sqrt{\sigma})$
Eppstein [14], Serang [37, 38] | $\tilde{O}(n \max S)$                         | $O(t \log t)$     | data structure
Lokshtanov et al. [29]       | $\tilde{O}(n^3 t)$                              | $\tilde{O}(n^2)$  | polynomial space
current work                 | $\tilde{O}(\min\{\sqrt{n'}t, t^{4/3}, \sigma\})$ (Theorem 2.17) | $O(t)$ | see Section 1.2

Table 1.1: Summary of deterministic pseudopolynomial time results on the subset sum problem.
The input is a target number $t$ and a multiset $S$ of $n$ numbers, with $n'$ distinct values up to $t$, and $\sigma$
denotes the sum of all elements in $S$.



Finally, it is unlikely that any subset sum algorithm runs in time $O(t^{1-\varepsilon} n^{c})$, for any
constant $c$ and $\varepsilon > 0$, as such an algorithm would imply that there are faster algorithms for
a wide variety of problems including set cover [4, 10].

1.1. Applications of the subset sum problem. 

The subset sum problem has a variety of applications including: power indices [42], scheduling 
[17, 34, 19], set-based queries in databases [41], breaking precise query protocols [12] and 
various other graph problems with cardinality constraints [6, 13, 5, 18, 27, 14] (for a survey 
of further applications see [26]). 

A faster pseudopolynomial time algorithm for subset sum would imply faster polynomial
time algorithms for a number of problems. The bottleneck graph partition problem
on weighted graphs is one such example. It asks to split the vertices of the graph into two
equal-sized sets such that the value of the bottleneck (maximum-weight) edge, over all edges
across the cut, is minimized. The impact of our results on this problem and other selected
applications is highlighted in Section 5.

1.2. Our contributions. 

The new results are summarized in Table 1.2. We consider the following all subset sums
problem: Given a multiset $S$ of $n$ elements, with $n'$ distinct values, with $\sigma$ being the total
sum of its elements, compute all the realizable subset sums up to a prespecified integer $u$.
Computing all subset sums for some $u \ge t$ also answers the standard subset sum problem
with target value $t$.

Our main contribution is a new algorithm for computing the all subset sums problem
in $\tilde{O}(\min\{\sqrt{n}u, u^{4/3}, \sigma\})$ time. The new algorithm improves over all previous work (see
Table 1.2). To the best of our knowledge, it is the fastest general deterministic pseudopolynomial
time algorithm for the all subset sums problem, and consequently, for the subset sum
problem.

Parameters   | Previous best       | Current work
-------------+---------------------+------------------------------------------
$n$ and $t$  | $O(nt / \log t)$    | $\tilde{O}(\min\{\sqrt{n}t, t^{4/3}\})$
$n'$ and $t$ | $O(n' t)$           | $\tilde{O}(\min\{\sqrt{n'}t, t^{4/3}\})$
$\sigma$     | $O(\sigma^{3/2})$   | $\tilde{O}(\sigma)$

Table 1.2: Our contribution on the subset sum problem compared to the previous best known results.
The input $S$ is a multiset of $n$ numbers with $n'$ distinct values, $\sigma$ denotes the sum of all elements in
$S$ and $t$ is the target number.


Our second contribution is an algorithm that solves the all subset sums problem modulo
$m$, in $O(\min\{\sqrt{n}m, m^{5/4}\} \log^2 m)$ time. Though the time bound is superficially similar to
the first algorithm, this algorithm uses a significantly different approach.

Both algorithms can be augmented to return the solution; i.e., the subset summing up 
to each number, with a polylogarithmic slowdown (see Section 4 for details). 

1.3. Sketch of techniques. 

The straightforward divide-and-conquer algorithm for solving the subset sum problem [23]
partitions the set of numbers into two sets, recursively computes their subset sums and
combines them together using FFT [14, 37, 38] (Fast Fourier Transform [9, Chapter 30]).
This algorithm has a running time of $O(\sigma \log \sigma \log n)$.

Sketch of the first algorithm (on integers). Our main new idea is to improve the
"conquer" step by taking advantage of the structure of the sets. In particular, if $S$ and $T$
lie in a short interval, then one can combine their subset sums quickly, due to their special
structure. On the other hand, if $S$ and $T$ lie in a long interval, but the smallest number of
the interval is large, then one can combine their subset sums quickly by ignoring most of the
sums that exceed the upper bound.

The new algorithm works by first partitioning the interval $\llbracket 0 : u \rrbracket$ into a logarithmic
number of exponentially long intervals. It then computes the partial sums of each interval recursively and
combines them together by aggressively deploying the above observation.

Sketch of the second algorithm (modulo m). Assume $m$ is a prime number. Using
known results from number theory, we show that for any $\ell$ one can partition the input set into
$O(|S|/\ell)$ subsets, such that every such subset is contained in an arithmetic progression of the
form $x, 2x, \ldots, \ell x$. The subset sums for such a set can be quickly computed by dividing and
later multiplying the numbers by $x$. The algorithm then combines all these subset sums to get the result.

Sadly, $m$ is not always prime. Fortunately, all the numbers that are relatively prime to $m$
can be handled in the same way as above. For the remaining numbers we use a recursive 
partition classifying each number, in a sieve-like process, according to which prime factors 
it shares with m. In the resulting subproblems all the numbers are coprime to the moduli 
used, and as such the above algorithm can be used. Finally, the algorithm combines the 
subset sums of the subproblems. 

Paper organization. Section 2 covers the algorithm for positive integers. Section 3 describes
the algorithm for the case of modulo $m$. Section 4 shows how we can recover the
subsets summing to each realizable sum, and Section 5 presents the impact of the results on selected
applications of the problem.

2. The algorithm for integers 

2.1. Notations. 

Let $\llbracket x : y \rrbracket = \{x, x + 1, \ldots, y\}$ denote the set of integers in the interval $[x, y]$. Similarly,
$\llbracket x \rrbracket = \llbracket 1 : x \rrbracket$. For two sets $X$ and $Y$, we denote by $X \oplus Y$ the set $\{x + y \mid x \in X \text{ and } y \in Y\}$.
If $X$ and $Y$ are sets of points in the plane, $X \oplus Y$ is the set $\{(x_1 + y_1, x_2 + y_2) \mid (x_1, x_2) \in
X \text{ and } (y_1, y_2) \in Y\}$.

For an element $s$ in a multiset $S$, its multiplicity in $S$ is denoted by $\mathbf{1}_S(s)$. We denote
by $\mathrm{set}(S)$ the set of distinct elements appearing in the multiset $S$. The size of a multiset $S$
is the number of distinct elements in $S$ (i.e., $|\mathrm{set}(S)|$). The cardinality of $S$ is $\mathrm{card}(S) =
\sum_{s \in \mathrm{set}(S)} \mathbf{1}_S(s)$. We denote that a multiset $S$ has all its elements in the interval $\llbracket x : y \rrbracket$ by
$S \subseteq \llbracket x : y \rrbracket$.

For a multiset $S$ of integers, let $\Sigma_S = \sum_{s \in \mathrm{set}(S)} \mathbf{1}_S(s)\, s$ denote the total sum of the elements
of $S$. The set of all subset sums is denoted by

$\Sigma(S) = \{\Sigma_T \mid T \subseteq S\}$.

The pair of the set of all subset sums using sets of size at most $\alpha$ along with their associated
cardinality is denoted by $\Sigma^{\le \alpha}[S] = \{(\Sigma_T, |T|) \mid T \subseteq S, |T| \le \alpha\}$. The set of all subset
sums of a set $S$ up to a number $u$ is denoted by $\Sigma_{\le u}(S) = \Sigma(S) \cap \llbracket 0 : u \rrbracket$.

2.2. From multisets to sets.

Here, we show that the case where the input is a multiset can be reduced to the case of a 
set. The reduction idea is somewhat standard (see [26, Section 7.1.1]), and first appeared in 
[28]. We present it here for completeness. 

Lemma 2.1. Given a multiset $S$ of integers, and a number $s \in S$, with $\mathbf{1}_S(s) \ge 3$. Consider
the multiset $S'$ resulting from removing two copies of $s$ from $S$, and adding the number $2s$
to it. Then, $\Sigma_{\le u}(S) = \Sigma_{\le u}(S')$. Observe that $\mathrm{card}(S') = \mathrm{card}(S) - 1$.

Proof: Consider any multiset $T \subseteq S$. If $T$ contains two or more copies of $s$, then replace two
copies by a single copy of $2s$. The resulting subset is $T' \subseteq S'$, and $\Sigma_T = \Sigma_{T'}$, establishing
the claim. ■

Lemma 2.2. Given a multiset $S$ of integers in $\llbracket u \rrbracket$ of cardinality $n$ with $n'$ unique values,
one can compute, in $O(n' \log^2 u)$ time, a multiset $T$, such that:

(i) $\Sigma_{\le u}(S) = \Sigma_{\le u}(T)$,

(ii) $\mathrm{card}(T) \le \mathrm{card}(S)$,

(iii) $\mathrm{card}(T) = O(n' \log u)$, and

(iv) no element in $T$ has multiplicity exceeding two.

Proof: Copy the elements of $S$ into a working multiset $X$. Maintain the elements of $\mathrm{set}(X)$
in a heap $D$, and let $T$ initially be the empty set. In each iteration, extract the minimum
element $x$ from the heap $D$. If $x > u$, we stop.

If $\mathbf{1}_X(x) \le 2$, then delete $x$ from $X$, and add $x$, with its appropriate multiplicity, to the
output multiset $T$, and continue to the next iteration.

If $\mathbf{1}_X(x) > 2$, then delete $x$ from $X$, add $x$ to the output set $T$ (with multiplicity one),
insert the number $2x$ into $X$ with multiplicity $m' = \lfloor (\mathbf{1}_X(x) - 1)/2 \rfloor$ (updating also the heap
$D$ by adding $2x$ if it is not already in it), and set $\mathbf{1}_X(x) \leftarrow \mathbf{1}_X(x) - 2m'$. The algorithm
now continues to the next iteration.

At any point in time, we have that $\Sigma_{\le u}(S) = \Sigma_{\le u}(X \cup T)$, and every iteration takes
$O(\log u)$ time. As such, the overall running time is $O(\mathrm{card}(T) \log u)$, as each iteration
increases $\mathrm{card}(T)$ by at most two. Finally, notice that every element in $T$ is of the form
$2^i x$, $x \in S$, for some $i$, where $i \le \log u$, and thus $\mathrm{card}(T) = O(n' \log u)$. ■
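The reduction in the proof above can be sketched as follows. This is one possible reading of the procedure, under the assumption that all inputs are positive integers and that the one or two copies of $x$ left over after pairing go directly to the output; the name `reduce_multiplicities` is hypothetical.

```python
import heapq
from collections import Counter

def reduce_multiplicities(S, u):
    """Return a multiset T (as a list) with the same subset sums up to u as S,
    in which no value occurs more than twice. Assumes positive integers."""
    X = Counter(S)                 # working multiset: value -> multiplicity
    heap = list(X)                 # heap over the distinct values of X
    heapq.heapify(heap)
    T = []
    while heap:
        x = heapq.heappop(heap)
        if x > u:                  # values above u cannot contribute to sums <= u
            break
        count = X.pop(x)
        if count <= 2:
            T.extend([x] * count)
            continue
        pairs = (count - 1) // 2   # each pair of x becomes one copy of 2x
        T.extend([x] * (count - 2 * pairs))   # one or two copies of x remain
        if 2 * x not in X:
            heapq.heappush(heap, 2 * x)
        X[2 * x] += pairs
    return T
```

For example, five copies of 1 become one copy of 1 and two copies of 2, which realize exactly the same sums 0 through 5.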

Note that the following lemma refers to sets. 

Lemma 2.3. Given two sets $S, T \subseteq \llbracket 0 : u \rrbracket$, one can compute $S \oplus T$ in $O(u \log u)$ time.

Proof: Let $f_S(x) = \sum_{i \in S} x^i$ be the characteristic polynomial of $S$. Construct, in a similar
fashion, the polynomial $f_T$ and let $g = f_S * f_T$. Observe that the coefficient of $x^i$ in $g$ is
greater than 0 if and only if $i \in S \oplus T$. As such, using FFT, one can compute the polynomial
$g$ in $O(u \log u)$ time, and extract $S \oplus T$ from it. ■
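The characteristic-polynomial idea can be illustrated without an FFT library. The sketch below is an assumption-laden stand-in, not the FFT routine of the lemma: it multiplies the two polynomials by packing their coefficients into Python big integers (Kronecker substitution) and reads off the nonzero coefficients of the product. The name `sumset` and the optional cap `u` are hypothetical conveniences.

```python
def sumset(S, T, u=None):
    """Return {s + t : s in S, t in T} for sets of non-negative integers,
    optionally keeping only values that are at most u."""
    S, T = set(S), set(T)
    if not S or not T:
        return set()
    # Every coefficient of f_S * f_T is at most min(|S|, |T|), so this many
    # bits per coefficient guarantees that no carry crosses into the next slot.
    width = min(len(S), len(T)).bit_length()
    f_s = sum(1 << (i * width) for i in S)   # f_S evaluated at 2**width
    f_t = sum(1 << (i * width) for i in T)   # f_T evaluated at 2**width
    g = f_s * f_t                            # packed coefficients of the product
    slot = (1 << width) - 1
    top = max(S) + max(T)
    if u is not None:
        top = min(top, u)
    # i is in the sumset exactly when the coefficient of x^i is nonzero.
    return {i for i in range(top + 1) if (g >> (i * width)) & slot}
```

Passing `u` corresponds to intersecting the result with $\llbracket 0 : u \rrbracket$, which is how the lemma is used later in the section.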

Observation 2.4. If $P$ and $Q$ form a partition of multiset $S$, then $\Sigma(S) = \Sigma(P) \oplus \Sigma(Q)$.

Combining all of the above together, we can now state the following lemma which simplifies
the upcoming analysis.

Lemma 2.5. Given an algorithm that computes $\Sigma_{\le u}(S)$ in $T(n, u) = \Omega(u \log^2 u)$ time, for
any set $S \subseteq \llbracket u \rrbracket$ with $n$ elements, then one can compute $\Sigma_{\le u}(S')$ for any multiset $S' \subseteq \llbracket u \rrbracket$,
with $n'$ distinct elements, in $O(T(n' \log u, u))$ time.

Proof: First, from $S'$, compute the multiset $T$ as described in Lemma 2.2, in $O(n' \log^2 u)$
time. As every element in $T$ appears at most twice, partition it into two sets $P$ and $Q$.
Then $\Sigma_{\le u}(S') = (\Sigma_{\le u}(P) \oplus \Sigma_{\le u}(Q)) \cap \llbracket 0 : u \rrbracket$, which is computed using Lemma 2.3, in
$O(u \log u)$ time. This reduces all subset sums for multisets of $n'$ distinct elements to two
instances of all subset sums for sets of size $O(n' \log u)$. ■

2.3. The input is a set of positive integers. 

In the previous section it was shown that there is little loss in generality and running time 
if the input is restricted to sets instead of multisets. For simplicity of exposition, we assume 
the input is a set from here on. 

Here, we present the main algorithm. At a high level, it uses a geometric partitioning
of the input range $\llbracket 0 : u \rrbracket$ to split the numbers into groups of exponentially long intervals.
Each of these groups is then processed separately, exploiting the fact that its interval range bounds
the cardinality of the sets from that group.

Observation 2.6. Let $g$ be a positive, superadditive (i.e., $g(x + y) \ge g(x) + g(y)$, $\forall x, y$)
function. For a function $f(n, m)$ satisfying

$f(n, m) = \max_{m_1 + m_2 = m} \left\{ f\!\left(\tfrac{n}{2}, m_1\right) + f\!\left(\tfrac{n}{2}, m_2\right) + g(m) \right\}$,

we have that $f(n, m) = O(g(m) \log n)$.

Theorem 2.7. Given a set of positive integers $S$ with total sum $\sigma$, one can compute the set
of all subset sums $\Sigma(S)$ in $O(\sigma \log \sigma \log n)$ time.

Proof: Partition $S$ into two sets $L$, $R$ of (roughly) equal cardinality, and compute recursively
$L' = \Sigma(L)$ and $R' = \Sigma(R)$. Next, compute $\Sigma(S) = L' \oplus R'$ using Lemma 2.3. The recurrence
for the running time is $f(n, \sigma) = \max_{\sigma_1 + \sigma_2 = \sigma} \{ f(n/2, \sigma_1) + f(n/2, \sigma_2) + O(\sigma \log \sigma) \}$,
and the solution to this recurrence, by Observation 2.6, is $O(\sigma \log \sigma \log n)$. ■
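The recursion in this proof can be sketched as below. To keep the sketch short, the combine step is written as a plain double loop, which is an assumed stand-in for the FFT-based routine of Lemma 2.3 and does not achieve the stated running time; the function names are hypothetical.

```python
def combine(A, B):
    """Sumset of A and B; a quadratic stand-in for the FFT-based Lemma 2.3."""
    return {a + b for a in A for b in B}

def all_subset_sums(S):
    """Return the set of all subset sums of the collection S by divide and conquer."""
    S = list(S)
    if not S:
        return {0}
    if len(S) == 1:
        return {0, S[0]}
    mid = len(S) // 2              # split into halves of roughly equal cardinality
    left = all_subset_sums(S[:mid])
    right = all_subset_sums(S[mid:])
    return combine(left, right)
```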
Remark 2.8. The standard divide-and-conquer algorithm of Theorem 2.7 was already known
in [38, 14]; here we give a better analysis. Note that the basic divide-and-conquer algorithm
without the FFT addition was known much earlier [23].

Lemma 2.9 ([38, 14]). Given a set $S \subseteq \llbracket \Delta \rrbracket$ of size $n$, one can compute the set $\Sigma(S)$ in
$O(n\Delta \log(n\Delta) \log n)$ time.

Proof: Observe that $\Sigma_S \le \Delta n$ and apply Theorem 2.7. ■

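Lemma 2.10. Given two sets of points $S, T \subseteq \llbracket 0 : u \rrbracket \times \llbracket 0 : v \rrbracket$, one can compute $S \oplus T$ in
$O(uv \log(uv))$ time.

Proof: Let $f_S(x, y) = \sum_{(i, j) \in S} x^i y^j$ be the characteristic polynomial of $S$. Construct, similarly,
the polynomial $f_T$, and let $g = f_S * f_T$. Note that the coefficient of $x^i y^j$ is greater
than 0 if and only if $(i, j) \in S \oplus T$. One can compute the polynomial $g$ by a straightforward
reduction to regular FFT (see multidimensional FFT [3, Chapter 12.8]), in $O(uv \log uv)$ time,
and extract $S \oplus T$ from it. ■

Lemma 2.11. Given two disjoint sets $B, C \subseteq \llbracket x : x + \ell \rrbracket$ and $\Sigma^{\le \alpha}[B]$, $\Sigma^{\le \alpha}[C]$, one can
compute $\Sigma^{\le \alpha}[B \cup C]$ in $O(\ell \alpha^2 \log(\ell \alpha))$ time.

Proof: Consider the function $f((i, j)) = (i - xj, j)$. Let $X = f(\Sigma^{\le \alpha}[B])$ and $Y =
f(\Sigma^{\le \alpha}[C])$. If $(i, j) \in \Sigma^{\le \alpha}[B] \cup \Sigma^{\le \alpha}[C]$, then $i = jx + y$ for $y \in \llbracket 0 : \ell j \rrbracket$. Hence
$X, Y \subseteq \llbracket 0 : \ell \alpha \rrbracket \times \llbracket 0 : \alpha \rrbracket$.

Computing $X \oplus Y$ using the algorithm of Lemma 2.10 can be done in $O(\ell \alpha^2 \log(\ell \alpha))$
time. Let $Z = (X \oplus Y) \cap (\llbracket 0 : \ell \alpha \rrbracket \times \llbracket 0 : \alpha \rrbracket)$. The set $\Sigma^{\le \alpha}[B \cup C]$ is then precisely $f^{-1}(Z)$.
Projecting $Z$ back takes an additional $O(\ell \alpha^2 \log(\ell \alpha))$ time. ■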

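Lemma 2.12. Given a set $S \subseteq \llbracket x : x + \ell \rrbracket$ of size $n$, computing the set $\Sigma^{\le \alpha}[S]$ takes
$O(\ell \alpha^2 \log(\ell \alpha) \log n)$ time.

Proof: Compute the median of $S$, denoted by $\delta$, in linear time. Next, partition $S$ into two
sets $L = S \cap \llbracket \delta \rrbracket$ and $R = S \cap \llbracket \delta + 1 : x + \ell \rrbracket$. Compute recursively $L' = \Sigma^{\le \alpha}[L]$ and
$R' = \Sigma^{\le \alpha}[R]$, and combine them into $\Sigma^{\le \alpha}[L \cup R]$ using Lemma 2.11. The recurrence for
the running time is:

$f(n, \ell) = \max_{\ell_1 + \ell_2 = \ell} \left\{ f\!\left(\tfrac{n}{2}, \ell_1\right) + f\!\left(\tfrac{n}{2}, \ell_2\right) + O(\ell \alpha^2 \log(\ell \alpha)) \right\}$,

which takes $O(\ell \alpha^2 \log(\ell \alpha) \log n)$ time, by Observation 2.6. ■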

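Lemma 2.13. Given a set $S \subseteq \llbracket x : x + \ell \rrbracket$ of size $n$, computing the set $\Sigma_{\le u}(S)$ takes
$O((u/x)^2 \ell \log(\ell u/x) \log n)$ time.

Proof: Apply Lemma 2.12 by setting $\alpha = \lfloor u/x \rfloor$ to get $\Sigma^{\le \alpha}[S]$. Projecting down by ignoring
the last coordinate and then intersecting with $\llbracket 0 : u \rrbracket$ gives the set $\Sigma_{\le u}(S)$. ■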
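The chain of Lemmas 2.11 to 2.13 can be illustrated with a small sketch that tracks (sum, cardinality) pairs. It is a simplified model under stated assumptions: the pairwise combine is a plain loop rather than the two-dimensional FFT of Lemma 2.10, the shift $f((i, j)) = (i - xj, j)$ is omitted because it only matters for the FFT grid size, and the names `bounded_pairs` and `subset_sums_in_interval` are hypothetical.

```python
def bounded_pairs(S, alpha):
    """All pairs (sum of T, |T|) over subsets T of S with at most alpha elements."""
    S = sorted(S)
    if not S:
        return {(0, 0)}
    if len(S) == 1:
        return {(0, 0), (S[0], 1)} if alpha >= 1 else {(0, 0)}
    mid = len(S) // 2              # split around the median
    left = bounded_pairs(S[:mid], alpha)
    right = bounded_pairs(S[mid:], alpha)
    # Combine, discarding any pair that would use more than alpha elements.
    return {(a + b, i + j) for (a, i) in left for (b, j) in right if i + j <= alpha}

def subset_sums_in_interval(S, u):
    """Subset sums of S that are at most u, for a nonempty set S of positive integers."""
    x = min(S)
    alpha = u // x                 # a subset with more than u/x elements exceeds u
    return {total for (total, _) in bounded_pairs(S, alpha) if total <= u}
```

The cap $\alpha = \lfloor u/x \rfloor$ is what keeps the second coordinate small when the smallest element $x$ is large.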

Lemma 2.14. Given a set $S \subseteq \llbracket u \rrbracket$ of size $n$ and a parameter $r_0 \ge 1$, partition $S$ as follows:

• $S_0 = S \cap \llbracket r_0 \rrbracket$, and

• for $i > 0$, $S_i = S \cap \llbracket r_{i-1} + 1 : r_i \rrbracket$, where $r_i = \lfloor 2^i r_0 \rfloor$.

The resulting partition is composed of $\nu = O(\log u)$ sets $S_0, S_1, \ldots, S_\nu$, and can be computed
in $O(n \log n)$ time.

Proof: Sort the numbers in $S$, and throw them into the sets, in the obvious fashion. As for
the number of sets, observe that $2^i r_0 \ge u$ when $i \ge \log u$. As such, after $\log u$ sets, $r_i \ge u$. ■
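A direct transcription of this partition might look as follows; it is a sketch assuming positive integer inputs and $r_0 \ge 1$, with the hypothetical name `geometric_partition`.

```python
from math import floor

def geometric_partition(S, r0):
    """Split S into groups S_0, S_1, ... where S_0 holds values up to r_0 and
    S_i holds values in (r_{i-1}, r_i] with r_i = floor(2**i * r0)."""
    parts = [[]]
    i = 0
    bound = floor(r0)              # current upper end r_i of the interval
    for s in sorted(S):
        while s > bound:           # advance to the interval that contains s
            i += 1
            bound = floor((2 ** i) * r0)
            parts.append([])
        parts[i].append(s)
    return parts
```

Groups that receive no element are kept as empty lists so that the index of a group always equals its $i$.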

Lemma 2.15. Given a set $S \subseteq \llbracket u \rrbracket$ of size $n$. For $i = 0, \ldots, \nu = O(\log u)$, let $S_i$ be the $i$th
set in the above partition and let $|S_i| = n_i$. One can compute $\Sigma_{\le u}(S_i)$, for all $i$, in overall
$O((u^2/r_0 + \min\{r_0, n\} r_0) \log^2 u)$ time.

Proof: Because $S \subseteq \llbracket u \rrbracket$, $n = O(u)$. If $i = 0$, then $S_0 \subseteq \llbracket r_0 \rrbracket$, and one can compute $\Sigma_{\le u}(S_0)$,
in $O(n_0 r_0 \log(n_0 r_0) \log n_0)$ time, using Lemma 2.9. Since $n_0 \le r_0$ and $n_0 \le n$, this simplifies
to $O(\min\{n, r_0\} r_0 \log^2 u)$.

For $i > 0$, the sets $S_i$ contain numbers at least as large as $r_{i-1}$. Moreover, each set $S_i$ is
contained in an interval of length $\ell_i = r_i - r_{i-1} = r_{i-1}$. Now, using Lemma 2.13, one can
compute $\Sigma_{\le u}(S_i)$ in $O((u/r_{i-1})^2 \ell_i \log(\ell_i u/r_{i-1}) \log n_i) = O\!\left(\frac{u^2}{r_{i-1}} \log^2 u\right)$ time. Summing
this bound, for $i = 1, \ldots, \nu$, results in $O\!\left(\frac{u^2}{r_0} \log^2 u\right)$ running time. ■

Theorem 2.16. Let $S \subseteq \llbracket u \rrbracket$ be a set of $n$ elements. Computing the set of all subset sums
$\Sigma_{\le u}(S)$ takes $O(\min\{\sqrt{n}u, u^{4/3}\} \log^2 u)$ time.

Proof: Assuming the partition of Lemma 2.14, compute the subset sums $T_i = \Sigma_{\le u}(S_i)$, for
$i = 0, \ldots, \nu$. Let $P_0 = T_0$, and let $P_i = (P_{i-1} \oplus T_i) \cap \llbracket 0 : u \rrbracket$. Each $P_i$ can be computed using
the algorithm of Lemma 2.3. Do this for $i = 1, \ldots, \nu$, and observe that the running time to
compute $P_\nu$, given all $T_i$, is $O(\nu (u \log u)) = O(u \log^2 u)$.

Finally, for all $i = 1, \ldots, \nu$, calculating the $T_i$'s:

• By setting $r_0$ equal to $u^{2/3}$ and using Lemma 2.15 takes $O(u^{4/3} \log^2 u)$.

• By setting $r_0$ equal to $u/\sqrt{n}$ and using Lemma 2.15 takes $O(\sqrt{n}u \log^2 u)$.

Taking the minimum of these two proves the theorem. ■
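As a quick check of the two parameter choices, one can substitute each value of $r_0$ into the bound $u^2/r_0 + \min\{r_0, n\} r_0$ of Lemma 2.15 (polylogarithmic factors omitted). The arithmetic below is a worked illustration, using only the bound stated in that lemma:

\[
r_0 = u^{2/3}: \quad \frac{u^2}{r_0} + \min\{r_0, n\}\, r_0 \;\le\; u^{4/3} + r_0^2 \;=\; 2u^{4/3},
\]
\[
r_0 = \frac{u}{\sqrt{n}}: \quad \frac{u^2}{r_0} + \min\{r_0, n\}\, r_0 \;\le\; \sqrt{n}\,u + n \cdot \frac{u}{\sqrt{n}} \;=\; 2\sqrt{n}\,u.
\]

In both cases the two terms are balanced, which is why these values of $r_0$ are the natural ones to try.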

Putting together Theorem 2.7, Theorem 2.16 and Lemma 2.5, results in the following 
when the input is a multiset. 

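Theorem 2.17 (Main theorem). Let $S \subseteq \llbracket u \rrbracket$ be a multiset of $n'$ distinct elements, with
total sum $\sigma$. Computing the set of all subset sums $\Sigma_{\le u}(S)$ takes

$O\!\left(\min\left\{\sqrt{n'}\,u \log^{5/2} u,\; u^{4/3} \log^2 u,\; \sigma \log \sigma \log(n' \log u)\right\}\right)$

time.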


3. Subset sums for finite cyclic groups 

In this section, we demonstrate the robustness of the idea underlying the algorithm of Section
2 by showing how to extend it to work for finite cyclic groups. The challenge is that the
previous algorithm throws away many sums that fall outside of $\llbracket u \rrbracket$ during its execution, but
this can no longer be done for finite cyclic groups, since these sums stay in the group and as
such must be accounted for.

3.1. Notations. 

For any positive integer $m$, the set of integers modulo $m$ with the operation of addition forms
a finite cyclic group, the group $\mathbb{Z}_m = \{0, 1, \ldots, m - 1\}$ of order $m$. Every finite cyclic group
of order $m$ is isomorphic to the group $\mathbb{Z}_m$ (as such it is sufficient for our purposes to work
with $\mathbb{Z}_m$). Let $U(\mathbb{Z}_m) = \{x \in \mathbb{Z}_m \mid \gcd(x, m) = 1\}$ be the set of units of $\mathbb{Z}_m$, and let
Euler's totient function $\varphi(m) = |U(\mathbb{Z}_m)|$ be the number of units of $\mathbb{Z}_m$. We remind the
reader that two integers $\alpha$ and $\beta$ such that $\gcd(\alpha, \beta) = 1$ are coprime (or relatively prime).
The set

$x\llbracket \ell \rrbracket = \{x, 2x, \ldots, \ell x\}$

is a finite arithmetic progression, henceforth referred to as a segment of length $|x\llbracket \ell \rrbracket| = \ell$.
Finally, let $S/x = \{s/x \mid s \in S \text{ and } x \mid s\}$ and $S\%x = \{s \in S \mid x \nmid s\}$, where $x \mid s$ and $x \nmid s$
denote that "$x$ divides $s$" and "$x$ does not divide $s$", respectively. For an integer $x$, let $\sigma_0(x)$
denote the number of divisors of $x$ and $\sigma_1(x)$ the sum of its divisors.


3.2. Subset sums and segments. 

Lemma 3.1. For a set $S \subseteq \mathbb{Z}_m$ of size $n$, such that $S \subseteq x\llbracket \ell \rrbracket$, the set $\Sigma(S)$ can be computed
in $O(n\ell \log(n\ell) \log n)$ time.

Proof: All elements of $x\llbracket \ell \rrbracket$ are multiples of $x$, and thus $S' := S/x \subseteq \llbracket \ell \rrbracket$ is a well defined
set of integers. Next, compute $\Sigma(S')$ in $O(n\ell \log(n\ell) \log n)$ time using the algorithm of
Lemma 2.9 (over the integers). Finally, compute the set $\{\sigma x \pmod{m} \mid \sigma \in \Sigma(S')\} = \Sigma(S)$
in linear time. ■
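The divide-then-multiply step can be sketched as follows. The sketch makes two assumptions that the lemma does not state explicitly: $x$ is a unit of $\mathbb{Z}_m$ (as for the segments used in Section 3.3), so that division by $x$ is multiplication by its modular inverse, and the integer subset sums are computed by a simple loop instead of Lemma 2.9. It relies on `pow(x, -1, m)`, available from Python 3.8; the function name is hypothetical.

```python
def segment_subset_sums(S, x, ell, m):
    """Subset sums modulo m of a set S contained in the segment {x, 2x, ..., ell*x}.
    Assumes gcd(x, m) == 1."""
    x_inv = pow(x, -1, m)                        # modular inverse of x
    quotients = [(s * x_inv) % m for s in S]     # S/x: the j with s = j*x (mod m)
    if not all(1 <= j <= ell for j in quotients):
        raise ValueError("S is not contained in the segment")
    sums = {0}
    for j in quotients:                          # integer subset sums of S/x
        sums |= {a + j for a in sums}
    return {(a * x) % m for a in sums}           # multiply back by x, reduce mod m
```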

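Lemma 3.2. Let $S \subseteq \mathbb{Z}_m$ be a set of size $n$ covered by segments $x_1\llbracket \ell \rrbracket, \ldots, x_k\llbracket \ell \rrbracket$, formally
$S \subseteq \bigcup_{i=1}^{k} x_i\llbracket \ell \rrbracket$; then the set $\Sigma(S)$ can be computed in $O(km \log m + n\ell \log(n\ell) \log n)$ time.

Proof: Partition, in $O(kn)$ time, the elements of $S$ into $k$ sets $S_1, \ldots, S_k$, such that $S_i \subseteq x_i\llbracket \ell \rrbracket$,
for $i \in \llbracket k \rrbracket$. Next, compute the subset sums $T_i = \Sigma(S_i)$ using the algorithm of Lemma 3.1,
for $i \in \llbracket k \rrbracket$. Then, compute $T_1 \oplus T_2 \oplus \cdots \oplus T_k = \Sigma(S)$ by $k - 1$ applications of Lemma 2.3.
The resulting running time is $O\!\left((k - 1) m \log m + \sum_i |S_i| \ell \log(|S_i| \ell) \log |S_i|\right) = O(km \log m +
n\ell \log(n\ell) \log n)$. ■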


3.3. Covering a subset of $U(\mathbb{Z}_m)$ by segments.

Somewhat surprisingly, one can always find a short but “heavy” segment. 

Lemma 3.3. Let $S \subseteq U = U(\mathbb{Z}_m)$. There exists a constant $c$ such that, for any $\ell$ with $c\,2^{\ln m / \ln \ln m} \le
\ell \le m$, there exists an element $x \in U$ such that $|x\llbracket \ell \rrbracket \cap S| = \Omega\!\left(\frac{\ell}{m} |S|\right)$.

Proof: Fix a $\beta \in U$. For $i \in U \cap \llbracket \ell \rrbracket$ consider the modular equation $ix \equiv \beta \pmod{m}$; this
equation has a unique solution $x \in U$ (here we are using the property that $i$ and $\beta$ are
coprime to $m$). Let $\alpha = |U|/2m$. Let $\omega(m)$ be the number of distinct prime factors of $m$, and
$\theta(m) = 2^{\omega(m)}$ be the number of distinct square-free divisors of $m$. Then $\theta(m) \le c\,2^{\ln m / \ln \ln m} \le \alpha \ell$
[35]. There are at least $2\alpha\ell - \theta(m) \ge \alpha\ell$ elements in $U \cap \llbracket \ell \rrbracket$ [40, Equation (1.4)].

Hence, when $\beta \in U$ is fixed, the number of values of $x$ such that $\beta \in x\llbracket \ell \rrbracket$ is at least
$\alpha\ell$. Namely, every element of $S \subseteq U$ is covered by at least $\alpha\ell$ segments $\{x\llbracket \ell \rrbracket \mid x \in U\}$. As
such, for a random $x \in U$ the expected number of elements of $S$ that are contained in $x\llbracket \ell \rrbracket$
is $(|S| \alpha \ell)/|U| = \frac{\ell}{2m} |S|$. Therefore, there must be a choice of $x$ such that $|x\llbracket \ell \rrbracket \cap S|$ is at least
the average, implying the claim. ■


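One can always find a small number of segments of length $\ell$ that contain all the elements
of $U(\mathbb{Z}_m)$.

Lemma 3.4. Let $S \subseteq U(\mathbb{Z}_m)$ be of size $n$. Then for any $\ell$ large enough for Lemma 3.3 to apply, there is a collection
$\mathcal{L}$ of $O\!\left(\frac{m}{\ell} \ln n\right)$ segments, each of length $\ell$, such that $S \subseteq \bigcup_{x \in \mathcal{L}} x\llbracket \ell \rrbracket$. Furthermore, such a
cover can be computed in $O((n + \log m)\ell)$ time.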




Proof: Consider the set system defined by the ground set $\mathbb{Z}_m$ and the sets $\{x\llbracket \ell \rrbracket \mid x \in U(\mathbb{Z}_m)\}$.
Next, consider the standard greedy set cover algorithm [24, 39, 30]: Pick a segment $x\llbracket \ell \rrbracket$ such
that $|x\llbracket \ell \rrbracket \cap S|$ is maximized, remove all elements of $S$ covered by $x\llbracket \ell \rrbracket$, add $x\llbracket \ell \rrbracket$ to the cover,
and repeat. By Lemma 3.3, there is a choice of $x$ such that the segment $x\llbracket \ell \rrbracket$ contains at least
a $c\ell/m$ fraction of $S$, for some constant $c$. After $m/c\ell$ iterations of this process, there will
be at most $(1 - c\ell/m)^{m/c\ell}\, n \le n/e$ elements remaining. As such, after $O\!\left(\frac{m}{\ell} \ln n\right)$ iterations
the original set $S$ is covered.

To implement this efficiently, in the preprocessing stage compute the modular inverses of
every element in $\llbracket \ell \rrbracket$ using the extended Euclidean algorithm, in $O(\ell \log m)$ time [9, Section
31.2]. Then, for every $b \in S$ and every $i \in \llbracket \ell \rrbracket$, find the unique $x$ (if it exists) such that
$ix \equiv b \pmod{m}$, using the inverse in $O(1)$ time. This indicates that $b$ is in $x\llbracket \ell \rrbracket \cap S$.
Now, the algorithm computes $x\llbracket \ell \rrbracket \cap S$, for all $x$, in time $O(n\ell + \ell \log m)$. Next, feed the
sets $x\llbracket \ell \rrbracket \cap S$, for all $x$, to a linear time greedy set cover algorithm and return the desired
segments in $O(n\ell)$ time [9, Section 35.3]. The total running time is $O((n + \log m)\ell)$. ■
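The bucketing-by-inverse idea in this proof can be sketched as below. It is a simplified illustration under stated assumptions: only multipliers $i$ that are coprime to $m$ are used (so that the inverse exists), the greedy step rescans all candidates instead of using the linear-time implementation cited above, and the function name is hypothetical. It relies on `pow(i, -1, m)` from Python 3.8 onward.

```python
from collections import defaultdict
from math import gcd

def greedy_segment_cover(S, ell, m):
    """Return a list of x such that the segments {x, 2x, ..., ell*x} mod m cover S.
    Assumes every element of S is a unit of Z_m."""
    # buckets[x] = elements of S that lie in the segment generated by x.
    buckets = defaultdict(set)
    for i in range(1, ell + 1):
        if gcd(i, m) != 1:
            continue
        i_inv = pow(i, -1, m)
        for b in S:
            buckets[(b * i_inv) % m].add(b)      # the unique x with i*x = b (mod m)
    uncovered = set(S)
    cover = []
    while uncovered:
        # Greedy choice: the segment covering the most still-uncovered elements.
        x = max(buckets, key=lambda y: len(buckets[y] & uncovered))
        cover.append(x)
        uncovered -= buckets[x]
    return cover
```

Because $i = 1$ is always allowed, every $b \in S$ lies in its own bucket, so each round covers at least one new element and the loop terminates.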

3.4. Subset sums when all numbers are coprime to $m$.

Lemma 3.5. Let $S \subseteq U(\mathbb{Z}_m)$ be a set of size $n$. Computing the set of all subset sums $\Sigma(S)$
takes $O(\min\{\sqrt{n}m, m^{5/4}\} \log m \log n)$ time.

Proof: If $|S| \ge 2\sqrt{m}$, then $\Sigma(S) = \mathbb{Z}_m$ [20, Theorem 1.1]. As such, the case where $n =
|S| \ge 2\sqrt{m}$ is immediate.

For the case that $n < 2\sqrt{m}$ we do the following. Apply the algorithm of Lemma 3.4 for
$\ell = m/\sqrt{n}$, which is $\Omega(m^{3/4})$ because $n < 2\sqrt{m}$. This results in a cover of $S$ by $O(\sqrt{n} \log n)$ segments (each of length
$\ell$), which takes $O((n + \log m)\ell) = O(\sqrt{n}m \log m)$ time. Next, apply the algorithm of Lemma 3.2
to compute $\Sigma(S)$ in $O(n\ell \log(n\ell) \log n) = O(\sqrt{n}m \log m \log n)$ time. Since $n = O(\sqrt{m})$, this
running time is $O(\min\{\sqrt{n}m, m^{5/4}\} \log m \log n)$. ■

3.5. The algorithm: Input is a subset of $\mathbb{Z}_m$

In this section, we show how to tackle the general case when $S$ is a subset of $\mathbb{Z}_m$.

3.5.1. Algorithm. 

The input instance is a triple $(\Gamma, \mu, \tau)$, where $\Gamma$ is a set, $\mu$ its modulus and $\tau$ an auxiliary
parameter. For such an instance $(\Gamma, \mu, \tau)$ the algorithm computes the set of all subset sums
of $\Gamma$ modulo $\mu$. The initial instance is $(S, m, m)$.

Let $q$ be the smallest prime factor of $\tau$, referred to as pivot. Partition $\Gamma$ into the two sets:

$\Gamma/q = \{s/q \mid s \in \Gamma \text{ and } q \mid s\}$ and $\Gamma\%q = \{s \in \Gamma \mid q \nmid s\}$.

Recursively compute the (partial) subset sums $\Sigma(\Gamma/q)$ and $\Sigma(\Gamma\%q)$ of the instances
$(\Gamma/q, \mu/q, \tau/q)$ and $(\Gamma\%q, \mu, \tau/q)$, respectively. Then compute the set of all subset sums
$\Sigma(\Gamma) = \{qx \mid x \in \Sigma(\Gamma/q)\} \oplus \Sigma(\Gamma\%q)$ by combining them together using Lemma 2.3. At
the bottom of the recursion, when $\tau = 1$, for each set compute its subset sums, using the
algorithm of Lemma 3.5.
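The recursion above can be sketched as follows. The sketch keeps the structure of the instance triple but, as labeled assumptions, replaces Lemma 3.5 at the leaves and Lemma 2.3 at the combine step with straightforward loops, and finds the pivot by trial division; all function names are hypothetical.

```python
def smallest_prime_factor(t):
    """Smallest prime factor of an integer t >= 2, by trial division."""
    p = 2
    while p * p <= t:
        if t % p == 0:
            return p
        p += 1
    return t

def subset_sums_mod(S, m):
    """All subset sums of the set S modulo m, following the pivot recursion."""
    def solve(gamma, mu, tau):
        if tau == 1:
            # Leaf: brute-force stand-in for Lemma 3.5.
            sums = {0}
            for s in gamma:
                sums |= {(a + s) % mu for a in sums}
            return sums
        q = smallest_prime_factor(tau)                    # the pivot; q divides mu
        divisible = {s // q for s in gamma if s % q == 0} # Gamma / q
        rest = {s for s in gamma if s % q != 0}           # Gamma % q
        a_sums = solve(divisible, mu // q, tau // q)
        b_sums = solve(rest, mu, tau // q)
        # Lift the sums of Gamma/q by q and combine (stand-in for Lemma 2.3).
        return {(q * a + b) % mu for a in a_sums for b in b_sums}
    return solve({s % m for s in S}, m, m)
```

Since $\tau$ always divides $\mu$ along the recursion, the pivot $q$ divides $\mu$ and the quotient modulus $\mu/q$ is an integer.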

3.5.2. Handling multiplicities. 

During the execution of the algorithm there is a natural tree formed by the recursion. Consider
an instance $(\Gamma, \mu, \tau)$ such that the pivot $q$ divides $\tau$ (and $\mu$) with multiplicity $r$. The
top level recursion would generate instances with sets $\Gamma/q$ and $\Gamma\%q$. In the next level, $\Gamma/q$ is
partitioned into $\Gamma/q^2$ and $(\Gamma/q)\%q$. On the other side of the recursion $\Gamma\%q$ gets partitioned
(naively) into $(\Gamma\%q)/q$ (which is an empty set) and $(\Gamma\%q)\%q = \Gamma\%q$. As such, this is a
superfluous step and can be skipped. Hence, compressing the $r$ levels of the recursion for
this instance results in $r + 1$ instances:

$\Gamma\%q,\ (\Gamma/q)\%q,\ \ldots,\ (\Gamma/q^{r-1})\%q,\ \Gamma/q^r$.

The total size of these sets is equal to the size of $\Gamma$. In particular, compress this subtree
into a single level of recursion with the original call having $r + 1$ children. At each such
level of the tree label the edges by $0, 1, 2, \ldots, r$, based on the multiplicity of the divisor of
the resulting (node) instance (i.e., an edge between instance sets $\Gamma$ and $(\Gamma/q^2)\%q$ would be
labeled by "2").

3.5.3. Analysis. 

The recursion tree formed by the execution of the algorithm has a level for each of the
$k = O(\log m / \log \log m)$ distinct prime factors of $m$ [35]; assume the root level is the 0th
level.

Lemma 3.6. Consider running the algorithm on input $(S, m, m)$. Then the values of the
moduli at the leaves of the recursion tree are unique, and are precisely the divisors of $m$.

Proof: Let $m = \prod_{i=1}^{k} q_i^{r_i}$ be the prime factorization of $m$, where $q_i < q_{i+1}$ for all $1 \le i < k$.
Then every vector $x = (x_1, \ldots, x_k)$, with $0 \le x_i \le r_i$, defines a path from the root to a
leaf of modulus $\prod_{i=1}^{k} q_i^{r_i - x_i}$ in a natural way: Starting at the root, at each level of the
tree follow the edge labeled $x_i$. If for two vectors $x$ and $y$ there is an $i \in \llbracket k \rrbracket$ such that
$x_i \ne y_i$, then the two paths they define will be different (starting at the $i$th level). And,
by the unique factorization of integers, the values of the moduli at the two leaves will also
be different. Finally, note that every divisor of $m$, $\prod_{i=1}^{k} q_i^{p_i}$ with $0 \le p_i \le r_i$, occurs as a
modulus of a leaf, and can be reached by following the path $(r_1 - p_1, \ldots, r_k - p_k)$ down the
tree. ■

Theorem 3.7. Let $S \subseteq \mathbb{Z}_m$ be a set of size $n$. Computing the set of all subset sums $\Sigma(S)$
takes $O(\min\{\sqrt{n}m, m^{5/4}\} \log^2 m)$ time.

Proof: The algorithm is described in Section 3.5.1, when the input is $(S, m, m)$. We break
down the running time analysis into two parts: The running time at the leaves, and the
running time at internal nodes.

Let $\delta$ be the number of leaves of the recursion tree. Arrange them so the modulus of the
$i$th leaf, $\mu_i$, is the $i$th largest divisor of $m$, and let $n_i$ be the size of the set at that leaf. Note that $\mu_i$ is at most $m/i$, for all $i \in \llbracket \delta \rrbracket$. Using
Lemma 3.5, the running time is bounded by

$O\!\left(\sum_{i=1}^{\delta} \min\left\{\sqrt{n_i}\,\mu_i,\ \mu_i^{5/4}\right\} \log n_i \log \mu_i\right) = O\!\left(\log m \log n \,\min\left\{\sum_{i=1}^{\delta} \sqrt{n_i}\,\mu_i,\ \sum_{i=1}^{\delta} \mu_i^{5/4}\right\}\right)$.

Using Cauchy-Schwarz, the first sum of the min is bounded by

$\sum_{i=1}^{\delta} \sqrt{n_i}\,\mu_i \le \sqrt{\sum_{i=1}^{\delta} n_i}\ \sqrt{\sum_{i=1}^{\delta} \mu_i^2} \le \sqrt{n}\ m \sqrt{\sum_{i=1}^{\delta} \frac{1}{i^2}} = O(\sqrt{n}\,m)$,

and the second by $O(m^{5/4})$. Putting it all together, the total work done at the leaves is
$O(\min\{\sqrt{n}m, m^{5/4}\} \log m \log n)$.

Next, consider an internal node of modulus $\mu$, pivot $q$ and $r + 1$ children. The algorithm
combines these instances by applying Lemma 2.3 $r$ times. The total running time necessary
for this process is described next. As the moduli of the instances decrease geometrically, pair
up the two smallest instances, combine them together, and in turn combine the result with
the next (third) smallest instance, and so on. This yields a running time of

$O\!\left(\sum_{i=0}^{r} \frac{\mu}{q^i} \log \frac{\mu}{q^i}\right) = O(\mu \log \mu)$.

At the leaf level, by Lemma 3.6, the sum of the moduli $\sum_{i=1}^{\delta} \mu_i$ equals $\sigma_1(m)$, and it is
known that $\sigma_1(m) = O(m \log \log m)$ [21, Theorem 323]. As such, the sum of the moduli of
all internal nodes is bounded by $O(km \log \log m) = O(m \log m)$, as the sum of each level is
bounded by the sum at the leaf level, and there are $k$ levels. As each internal node, with
modulus $\mu$, takes $O(\mu \log \mu)$ time and $x \log x$ is a convex function, the total running time
spent on all internal nodes is $O(m \log m \log(m \log m)) = O(m \log^2 m)$.

Aggregating everything together, the complete running time of the algorithm is bounded
by $O(\min\{\sqrt{n}m, m^{5/4}\} \log^2 m)$, implying the theorem. ■

The results of this section, along with the analysis of the recursion tree above, imply
the following corollary on covering with a small number of segments. The result is useful
for error correction codes, and improves the recent bound of Chen et al. by a factor of $\sqrt{\ell}$
[8].

Corollary 3.8. There exists a constant $c$ such that, for all $\ell$ with $c\,2^{\ln m / \ln \ln m} \le \ell \le m$, one can cover
$\mathbb{Z}_m$ with $O((\sigma_1(m) \ln m)/\ell) + \sigma_0(m)$ segments of length $\ell$. Furthermore, such a cover can be
computed in $O(m\ell)$ time.

Proof: Let $S_{m/d} = \{x/(m/d) \mid x \in \mathbb{Z}_m \text{ and } \gcd(x, m) = m/d\}$, for all $d \mid m$. Note that
$S_{m/d} = U(\mathbb{Z}_d)$, hence by Lemma 3.4, each $S_{m/d}$ has a cover of $O((d \ln d)/\ell)$ segments. Next,
"lift" the segments of each set $S_{m/d}$ back up to $\mathbb{Z}_m$ (by multiplying by $m/d$), forming a cover
of $\mathbb{Z}_m$. The number of segments in the final cover is bounded by

$\sum_{d \mid m,\ \ell \le d} O\!\left(\frac{d}{\ell} \ln m\right) + \sum_{d \mid m,\ \ell > d} 1 = O\!\left(\frac{\sigma_1(m) \ln m}{\ell}\right) + \sigma_0(m)$.

The time to cover each $S_{m/d}$, by Lemma 3.4, is $O((n + \log m)\ell) = O((\varphi(d) + \log d)\ell)$, since
there are $\varphi(d)$ elements in $S_{m/d}$ and $S_{m/d} \subseteq \mathbb{Z}_d$. Also, $\varphi(d)$ dominates $\log d$, as $\varphi(d) =
\Omega(d / \log \log d)$ [21, Theorem 328], therefore the running time simplifies to $O(\varphi(d)\ell)$. Summing
over all $S_{m/d}$ we have

$\sum_{d \mid m} O(\varphi(d)\ell) = O\!\left(\ell \sum_{d \mid m} \varphi(d)\right) = O(m\ell)$,

since $\sum_{d \mid m} \varphi(d) = m$ [21, Section 16.2], implying the corollary. ■

If $\ell < c\,2^{\ln m / \ln \ln m}$, then $\ell = m^{o(1)}$. The corollary above then shows that for all $\ell$, there is a
cover of $\mathbb{Z}_m$ with $m^{1+o(1)}/\ell$ segments.

4. Recovering the solution 

Given sets $X$ and $Y$, a number $x$ is a witness for $i \in X \oplus Y$ if $x \in X$ and $i - x \in Y$. A
function $w : X \oplus Y \to X$ is a witness function if $w(i)$ is a witness of $i$.

If one can find a witness function for each $X \oplus Y$ computation of the algorithm, then
we can trace back through the recursion tree and reconstruct the subset that sums up to $t$ in $O(n)$
time. The problem of finding a witness function quickly can be reduced to the reconstruction
problem defined next.
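
The traceback can be illustrated with a deliberately naive sketch. Here witnesses are found by brute force over pairs, not by the reconstruction machinery described next, and the function names (`sumset_with_witness`, `build`, `recover`) are hypothetical. The point is only to show how stored witness functions let one walk down the recursion tree and recover a subset.

```python
def sumset_with_witness(X, Y, u):
    """Naive X (+) Y restricted to [0, u]; w[i] is in X and i - w[i] is in Y."""
    w = {}
    for x in X:
        for y in Y:
            if x + y <= u and x + y not in w:
                w[x + y] = x
    return w

def build(S, u):
    """Recursion tree over the non-empty list S: node = (witness map, left, right).
    The keys of the witness map are the realizable sums of that node."""
    if len(S) == 1:
        sums = {0: 0, S[0]: S[0]} if S[0] <= u else {0: 0}
        return (sums, None, None)
    mid = len(S) // 2
    left, right = build(S[:mid], u), build(S[mid:], u)
    return (sumset_with_witness(left[0].keys(), right[0].keys(), u), left, right)

def recover(node, t):
    """Follow witnesses down the tree; returns the elements that sum to t."""
    w, left, right = node
    if left is None:                 # leaf: t is either 0 or the single element
        return [t] if t != 0 else []
    x = w[t]                         # x is realizable on the left, t - x on the right
    return recover(left, x) + recover(right, t - x)

S = [3, 5, 7, 11]
tree = build(S, 30)
subset = recover(tree, 15)
assert sum(subset) == 15
```

Once the tree is built, the traceback visits each node once, which matches the linear-time reconstruction claimed above; the quadratic cost here comes entirely from the naive witness search.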
4.1. Reduction to the reconstruction problem.

In the reconstruction problem, there are hidden sets $S_1, \ldots, S_n \subseteq [m]$ and we have two
oracles Size and Sum that take as input a query set $Q$.

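• Size($Q$) returns the size of each intersection:

\[ \big(|S_1 \cap Q|, |S_2 \cap Q|, \ldots, |S_n \cap Q|\big) \]

• Sum($Q$) returns the sum of elements in each intersection:

\[ \Big(\sum_{s \in S_1 \cap Q} s,\ \sum_{s \in S_2 \cap Q} s,\ \ldots,\ \sum_{s \in S_n \cap Q} s\Big) \]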

The reconstruction problem asks to find $n$ values $x_1, \ldots, x_n$ such that for all $i$, if $S_i$ is nonempty,
then $x_i \in S_i$. Let $f$ be the running time of calling the oracles, and assume $f = \Omega(m + n)$;
then it is known that one can find $x_1, \ldots, x_n$ in $O(f \log n \,\mathrm{polylog}\, m)$ time [1].

If $X, Y \subseteq [\![u]\!]$, finding the witness of $X \oplus Y$ is just a reconstruction problem. Here the
hidden sets are $W_0, \ldots, W_{2u} \subseteq [\![2u]\!]$, where $W_i = \{x \mid x + y = i \text{ and } x \in X, y \in Y\}$ is the
set of witnesses of $i$. Next, define the polynomials $\chi_Q(x) = \sum_{i \in Q} x^i$ and $I_Q(x) = \sum_{i \in Q} i\,x^i$.

The coefficient for $x^i$ in $\chi_Q \chi_Y$ is $|W_i \cap Q|$ and in $I_Q \chi_Y$ is $\sum_{s \in W_i \cap Q} s$, which are precisely the
$i$th coordinates of Size($Q$) and Sum($Q$), respectively. Hence, the oracles can be implemented
using polynomial multiplication, in $\widetilde{O}(u)$ time per call. This yields an $\widetilde{O}(u)$ time deterministic
algorithm to compute $X \oplus Y$ with its witness function.

Hence, with a polylogarithmic slowdown, we can find a witness function every time we 
perform a $\oplus$ operation, thus, effectively, maintaining which subsets sum up to which sum.
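
The polynomial implementation of the two oracles can be sketched as follows. This is a minimal illustration under stated assumptions: `poly_mul` is a schoolbook product standing in for an FFT-based one, the name `witness_oracles` is hypothetical, and the query set is intersected with $X$ so that only genuine witnesses are counted. The full reconstruction algorithm of [1] is not reproduced; the example only shows the easy case in which exactly one witness lies in the query set, so that Sum reveals it directly.

```python
def poly_mul(a, b):
    """Schoolbook product of two coefficient lists (an FFT product would be used for speed)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] += ai * bj
    return out

def witness_oracles(X, Y, Q, u):
    """Size(Q) and Sum(Q) for the hidden witness sets W_0, ..., W_2u of X (+) Y."""
    X, Y = set(X), set(Y)
    q = set(Q) & X                      # assumption: only members of X count as witnesses
    chi_q = [1 if i in q else 0 for i in range(u + 1)]   # chi_Q
    ind_q = [i if i in q else 0 for i in range(u + 1)]   # I_Q
    chi_y = [1 if i in Y else 0 for i in range(u + 1)]   # chi_Y
    return poly_mul(chi_q, chi_y), poly_mul(ind_q, chi_y)

X, Y, u = {1, 2, 4}, {0, 3}, 4
size, total = witness_oracles(X, Y, X, u)
# Whenever exactly one witness lies in Q, Sum(Q) is that witness.
witness = {i: total[i] for i in range(2 * u + 1) if size[i] == 1}
```

In this example the sum 4 has two witnesses (1 and 4), so a further query with a smaller $Q$ would be needed to isolate one of them.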

5. Applications and extensions 

Since every algorithm that uses subset sum as a subroutine can benefit from the new algorithm,
we only highlight certain selected applications and some interesting extensions. Most
of these applications are derived directly from the divide-and-conquer approach. 

5.1. Bottleneck graph partition. 

Let $G = (V, E)$ be a graph with $n$ vertices and $m$ edges, and let $w : E \to \mathbb{R}^+$ be a weight
function on the edges. The bottleneck graph partition problem is to split the vertices into
two equal-sized sets such that the value of the bottleneck (maximum-weight) edge, over 
all edges across the cut, is minimized. This is the simplest example of a graph partition 
problem with cardinality constraints. The standard divide-and-conquer algorithm reduces 
this problem to solving $O(\log n)$ subset sum problems: pick a weight, delete all edges with
smaller weight, and decide if there exists an arrangement of components that satisfies the size
requirement [22]. The integers being summed are the sizes of the components, the
target value is $n/2$, and the sum of all inputs is $n$. Previously, using the $O(\sigma^{3/2})$ algorithm
by Klinz and Woeginger, the best known running time was $O(m + n^{3/2} \log n)$ [27]. Using
Theorem 2.7, this is improved to $O(m) + \widetilde{O}(n)$ time.
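
The reduction can be sketched as a binary search over candidate bottleneck values. The code below is an illustration under several assumptions: the function name is hypothetical, vertices are numbered from 0, the bottleneck of an empty cut is taken to be 0, and the subset sum step uses a plain bitset dynamic program instead of the algorithm of Theorem 2.7. For a candidate value $b$, edges of weight at most $b$ may be cut, so the heavier edges are contracted with union-find and the component sizes must admit a subset summing to $n/2$.

```python
def bottleneck_partition_value(n, edges):
    """edges: list of (u, v, w) with w > 0. Returns the minimum bottleneck value
    of a split into two halves of size n/2, or None if n is odd."""
    if n % 2:
        return None
    half = n // 2
    candidates = [0] + sorted({w for _, _, w in edges})

    def feasible(b):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        for a, c, w in edges:
            if w > b:                           # too heavy to cut: keep together
                parent[find(a)] = find(c)
        sizes = {}
        for v in range(n):
            r = find(v)
            sizes[r] = sizes.get(r, 0) + 1
        reach = 1                               # bit i set <=> sum i is realizable
        for s in sizes.values():
            reach |= reach << s
        return bool((reach >> half) & 1)

    lo, hi = 0, len(candidates) - 1             # the largest candidate is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]
```

Feasibility is monotone in $b$ (raising $b$ only splits components further), which is what justifies the binary search and the $O(\log n)$ subset sum calls.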

5.2. All subset sums with cardinality information. 

Let $S = \{s_1, s_2, \ldots, s_n\}$. Define $\mathcal{S}^{\le u}_{\le n}(S)$ to be the set of pairs $(i, j)$, such that $(i, j) \in
\mathcal{S}^{\le u}_{\le n}(S)$ if and only if $i \le u$, $j \le n$ and there exists a subset of size $j$ in $S$ that sums up to
$i$. We are interested in computing the set $\mathcal{S}^{\le u}_{\le n}(S)$.
We are only aware of a folklore dynamic programming algorithm for this problem that
runs in $O(n^2 u)$ time. We include it here for completeness. Let $D[i, j, k]$ be true if and only if
there exists a subset of size $j$ that sums to $i$ using the first $k$ elements. The recursive relation
is


\[
D[i, j, k] =
\begin{cases}
\text{true} & \text{if } i = j = k = 0, \\
\text{false} & \text{if } i > j = k = 0, \\
D[i, j, k-1] \lor D[i - s_k, j - 1, k - 1] & \text{otherwise,}
\end{cases}
\]

where we want to compute $D[i, j, n]$ for all $i \le u$ and $j \le n$. In the following we show how
to do (significantly) better.
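
A compact way to run this folklore recurrence is to drop the index $k$ and store, for each cardinality $j$, a bitmask of the realizable sums. The sketch below is one such variant, not the paper's improved algorithm; the function name is hypothetical, and out-of-range cases ($j > 0$ with no elements, or negative sums) are treated as false.

```python
def subset_sums_with_cardinality(S, u):
    """Return all pairs (i, j) such that some j-element subset of the list S sums to i <= u."""
    n = len(S)
    mask = (1 << (u + 1)) - 1          # keep only sums in [0, u]
    reach = [0] * (n + 1)              # reach[j]: bit i set <=> (i, j) is realizable
    reach[0] = 1                       # the empty subset: sum 0, size 0
    for s in S:
        # Descending j ensures each element is used at most once.
        for j in range(n, 0, -1):
            reach[j] |= (reach[j - 1] << s) & mask
    return {(i, j) for j in range(n + 1) for i in range(u + 1) if (reach[j] >> i) & 1}

pairs = subset_sums_with_cardinality([1, 2, 3], 4)
assert pairs == {(0, 0), (1, 1), (2, 1), (3, 1), (3, 2), (4, 2)}
```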

Theorem 5.1. Let $S \subseteq [\![u]\!]$ be a set of size $n$, then one can compute the set $\mathcal{S}^{\le u}_{\le n}(S)$ in
$O(nu \log(nu) \log n)$ time.


Proof: Partition $S$ into two (roughly) equally sized sets $S_1$ and $S_2$. Find $\mathcal{S}^{\le u}_{\le n/2}(S_1)$ and
$\mathcal{S}^{\le u}_{\le n/2}(S_2)$ recursively, and combine them using Lemma 2.10, in $O(nu \log(nu))$ time. The
final running time is then given by Observation 2.6. ■
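
The structure of this divide-and-conquer can be sketched as below. This is only an outline of the recursion: the combine step is a naive pairwise sumset over (sum, size) pairs, whereas the proof relies on the FFT-based combination of Lemma 2.10, so the sketch does not achieve the stated running time. The function names are hypothetical.

```python
def combine(A, B, u):
    """Sumset of two sets of (sum, size) pairs, keeping only sums at most u."""
    return {(a + c, b + d) for (a, b) in A for (c, d) in B if a + c <= u}

def sums_with_cardinality(S, u):
    """Divide and conquer over the list S; returns all realizable (sum, size) pairs."""
    if not S:
        return {(0, 0)}
    if len(S) == 1:
        return {(0, 0), (S[0], 1)} if S[0] <= u else {(0, 0)}
    mid = len(S) // 2
    left = sums_with_cardinality(S[:mid], u)
    right = sums_with_cardinality(S[mid:], u)
    return combine(left, right, u)
```

Because the two halves are disjoint, every subset of $S$ splits uniquely into a subset of each half, which is why adding the pairs coordinate-wise is correct.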


5.3. Counting and power index. 

Here we show that the standard divide-and-conquer algorithm can also answer the counting 
version of all subset sums. Namely, computing the function $N_{u,S}(x)$: the number of subsets
of $S$ that sum up to $x$, where $x \le u$.

For two functions $f, g : X \to Y$, define $f \odot g : X \to Y$ to be

\[ (f \odot g)(x) = \sum_{t \in X} f(t)\, g(x - t). \]

Corollary 5.2. Given two functions $f, g : [\![0 : u]\!] \to \mathbb{N}$ such that $f(x), g(x) \le b$ for all $x$,
one can compute $f \odot g$ in $O(u \log u \log b)$ time.

Proof: This is an immediate extension of Lemma 2.10 using the fact that multiplication of 
two degree $u$ polynomials, with coefficient size at most $b$, takes $O(u \log u \log b)$ time [36]. ■

Theorem 5.3. Let $S$ be a set of $n$ positive integers. One can compute the function $N_{u,S}$ in
$O(nu \log u \log n)$ time.

Proof: Partition $S$ into two (roughly) equally sized sets $S_1$ and $S_2$. Compute $N_{u,S_1}$ and
$N_{u,S_2}$ recursively, and combine them into $N_{u,S} = N_{u,S_1} \odot N_{u,S_2}$ using Corollary 5.2, in
$O(u \log u \log 2^n) = O(nu \log u)$ time. The final running time is then given by Observation 2.6. ■
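
The counting recursion can be sketched as follows, with a schoolbook convolution standing in for the fast polynomial multiplication of Corollary 5.2 (so the running time of the sketch is worse than the bound in Theorem 5.3). The function names are hypothetical, and Python integers are used so that counts as large as $2^n$ are handled exactly.

```python
def convolve_truncated(f, g, u):
    """(f (.) g)(x) = sum over t of f(t) * g(x - t), for 0 <= x <= u."""
    out = [0] * (u + 1)
    for t, ft in enumerate(f):
        if ft:
            for s, gs in enumerate(g):
                if t + s > u:
                    break
                out[t + s] += ft * gs
    return out

def count_subset_sums(S, u):
    """N[x] = number of subsets of the list S (positive integers) with sum x, 0 <= x <= u."""
    if not S:
        return [1] + [0] * u
    if len(S) == 1:
        N = [1] + [0] * u              # the empty subset
        if S[0] <= u:
            N[S[0]] += 1               # the subset containing the single element
        return N
    mid = len(S) // 2
    return convolve_truncated(count_subset_sums(S[:mid], u),
                              count_subset_sums(S[mid:], u), u)

print(count_subset_sums([1, 2, 3], 6))  # [1, 1, 1, 2, 1, 1, 1]
```

The value 2 at index 3 reflects the two subsets $\{3\}$ and $\{1, 2\}$.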


5.3.1. Power indices. 

The Banzhaf index of a set $S$ of $n$ voters with cutoff $u$ can be recovered from $N_{u,S}$ in linear
time. Theorem 5.3 yields an algorithm for computing the Banzhaf index in $\widetilde{O}(nu)$
time. Previous dynamic programming algorithms take $O(nu)$ arithmetic operations, which
translates to $O(n^2 u)$ running time [42]. Similar speed-ups (of, roughly, a factor $n$) can be
obtained for the Shapley-Shubik index.
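
One common way to carry out this recovery is sketched below; it is an assumption about the intended procedure, since the text does not spell it out. The sketch assumes a weighted voting game with positive integer weights and a quota of at least 1, and counts a voter as critical for a coalition of the other voters when that coalition's weight is below the quota but reaches it once the voter joins. The counts for "all voters except one" are obtained from the global counts by undoing one step of the counting recurrence, which takes linear time per voter. The function name is hypothetical, and the global counts are computed with a simple dynamic program for self-containment.

```python
def banzhaf_swings(weights, quota):
    """Raw Banzhaf counts: for each voter, the number of coalitions of the other
    voters that lose on their own but win once the voter joins."""
    u = quota - 1
    # N[x] = number of subsets of all voters with total weight x <= u (standard DP;
    # a divide-and-conquer counting routine could be used here instead).
    N = [1] + [0] * u
    for w in weights:
        for x in range(u, w - 1, -1):
            N[x] += N[x - w]
    swings = []
    for w in weights:
        # Remove this voter from the counts: M[x] = N[x] - M[x - w].
        M = N[:]
        for x in range(w, u + 1):
            M[x] -= M[x - w]
        # Coalitions of the others with weight in [quota - w, quota - 1].
        swings.append(sum(M[max(0, quota - w):quota]))
    return swings

print(banzhaf_swings([4, 3, 2, 1], 6))  # [5, 3, 3, 1]
```

Dividing each count by $2^{n-1}$, or by the sum of all counts, gives the usual normalized forms of the index.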